Exploratory Data Analysis on S18 Sheep

Load libraries and aggregated data

# Double check that the generated data is consistent with the data dictionary

aggregate_data %>% filter(sheep == "S18") %>% 
  filter(WaveFront == "SR",Catheter_Type == "Penta", Point_Number == 4815) %>%
  select(signal) %>%
  unlist() %>%
  plot(type = "l")

Correspondingly, the raw data for the same point looks like:

aggregate_data %>% filter(sheep == "S18") %>% 
  filter(WaveFront == "SR",Catheter_Type == "Penta", Point_Number == 4815) %>%
  select(rawsignal) %>%
  unlist() %>%
  plot(type = "l")
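
The two chunks above differ only in the column they select, so a small helper could remove the repetition. The following is a minimal sketch, not part of the original analysis: `plot_point_signal` is a hypothetical name, the toy tibble is made up, and it assumes `signal` and `rawsignal` are list columns holding one numeric vector per row (which the use of `unlist()` suggests).

```r
library(dplyr)

# Hypothetical helper: plot one stored trace for a single mapping point.
# Assumes `column` names a list column with one numeric vector per row.
plot_point_signal <- function(data, sheep_id, wavefront, catheter, point,
                              column = "signal") {
  rows <- data %>%
    ungroup() %>%
    filter(sheep == sheep_id, WaveFront == wavefront,
           Catheter_Type == catheter, Point_Number == point)
  trace <- unlist(rows[[column]])
  plot(trace, type = "l", ylab = column)
  invisible(trace)
}

# Toy data with illustrative values only, mimicking the assumed column layout.
toy <- tibble(
  sheep = c("S18", "S18"),
  WaveFront = c("SR", "LVp"),
  Catheter_Type = "Penta",
  Point_Number = c(4815, 4816),
  signal = list(c(0, 1, 0, -1), c(0, 2, 0)),
  rawsignal = list(c(0.1, 1.2, 0.1, -0.9), c(0.2, 2.1, 0.1))
)

trace <- plot_point_signal(toy, "S18", "SR", "Penta", 4815, column = "rawsignal")
```

Called with `column = "signal"` it would reproduce the first plot, and with `column = "rawsignal"` the second.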

High-level split of the count of observations with and without histology info. Note: the histology data is sometimes blank and a signal is sometimes not present. Generally, observations that have histology labels (0 or 1) are outnumbered by those that don't, particularly for S20.

aggregate_data %>% group_by(sheep,Catheter_Type,WaveFront) %>% 
  summarise(histology_count = sum(!is.na(endocardium_scar)),
            no_histology_count = sum(is.na(endocardium_scar))) %>% 
  arrange(sheep,desc(histology_count))
## `summarise()` has grouped output by 'sheep', 'Catheter_Type'. You can override
## using the `.groups` argument.
## # A tibble: 18 × 5
## # Groups:   sheep, Catheter_Type [6]
##    sheep Catheter_Type WaveFront histology_count no_histology_count
##    <fct> <fct>         <fct>               <int>              <int>
##  1 S12   Penta         SR                    709                574
##  2 S12   Penta         RVp                   666                511
##  3 S12   Penta         LVp                   291                206
##  4 S15   Penta         SR                    792               1768
##  5 S15   Penta         LVp                   402               1648
##  6 S15   Penta         RVp                   390               1333
##  7 S17   Penta         SR                    456                501
##  8 S17   Penta         LVp                   391                652
##  9 S17   Penta         RVp                   391                769
## 10 S18   Penta         SR                   1329               1270
## 11 S18   Penta         LVp                  1143               1200
## 12 S18   Penta         RVp                   977                990
## 13 S20   Penta         LVp                   859               1589
## 14 S20   Penta         SR                    582               1008
## 15 S20   Penta         RVp                   478               1090
## 16 S9    Penta         RVp                   607                962
## 17 S9    Penta         LVp                   577                712
## 18 S9    Penta         Ap                    511                958
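
To make the comparison easier to read, the two counts can be turned into a labelled share per group. This is a sketch on three rows copied from the table above; the column name `histology_share` is introduced here for illustration only.

```r
library(dplyr)

# Three SR rows copied from the summary table above.
counts <- tribble(
  ~sheep, ~WaveFront, ~histology_count, ~no_histology_count,
  "S12",  "SR",        709,              574,
  "S15",  "SR",        792,             1768,
  "S20",  "SR",        582,             1008
)

# Share of observations in each group that carry a histology label.
counts <- counts %>%
  mutate(histology_share = histology_count /
           (histology_count + no_histology_count))

print(counts)
```

On these rows the labelled share is above one half for S12 and well below one half for both S15 and S20.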

Looking at the cleaned data only, some groups (e.g. S20 with the SR wavefront) have more imbalanced Scar vs NoScar labels:

cleaned_aggregate_data %>% filter(!is.null(signal)) %>%
  filter(!is.na(endocardium_scar)) %>% 
  group_by(sheep,Catheter_Type,WaveFront, Categorical_Label) %>% 
    summarise(count = n())
## `summarise()` has grouped output by 'sheep', 'Catheter_Type', 'WaveFront'. You
## can override using the `.groups` argument.
## # A tibble: 36 × 5
## # Groups:   sheep, Catheter_Type, WaveFront [18]
##    sheep Catheter_Type WaveFront Categorical_Label count
##    <fct> <fct>         <fct>     <chr>             <int>
##  1 S12   Penta         LVp       NoScar              100
##  2 S12   Penta         LVp       Scar                191
##  3 S12   Penta         RVp       NoScar              209
##  4 S12   Penta         RVp       Scar                457
##  5 S12   Penta         SR        NoScar              188
##  6 S12   Penta         SR        Scar                521
##  7 S15   Penta         LVp       NoScar              158
##  8 S15   Penta         LVp       Scar                244
##  9 S15   Penta         RVp       NoScar              152
## 10 S15   Penta         RVp       Scar                238
## # ℹ 26 more rows

The available data shrinks because not every data point has signal info and not every point has histology info (blanks in the cleaned_histology_all file).

We feed both versions to the Orange data mining analysis: the filtered data (rows with blanks in cleaned_histology_all or with no signal are dropped) and the imputed data (blank labels are treated as zeros).
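
A minimal sketch of how the filtered and imputed variants might be derived, on made-up toy rows. It assumes `signal` is a list column in which a missing signal is stored as `NULL`, and that the imputed variant still drops rows without a signal; neither detail is confirmed by the text above. The helper column `has_signal` is hypothetical.

```r
library(dplyr)

# Toy rows: point 2 and 4 have blank histology, point 3 has no signal.
toy <- tibble(
  Point_Number = 1:4,
  endocardium_scar = c(1, NA, 0, NA),
  signal = list(c(0.1, 0.4), c(0.2, 0.3), NULL, c(0.0, 0.5))
)

# Per-row signal check that does not depend on the data being rowwise.
toy <- toy %>% mutate(has_signal = lengths(signal) > 0)

# Filtered: keep only rows with both a signal and a histology label.
filtered <- toy %>% filter(has_signal, !is.na(endocardium_scar))

# Imputed: keep rows with a signal and treat blank labels as 0.
imputed <- toy %>%
  filter(has_signal) %>%
  mutate(endocardium_scar = coalesce(endocardium_scar, 0))
```

Using `lengths(signal) > 0` rather than `is.null(signal)` keeps the check per row even when the data frame is not rowwise, since `is.null()` applied to a whole column returns a single value.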

The proportions of Scar and NoScar are “roughly” balanced.

aggregate_data %>% filter(!is.null(signal)) %>% count(Categorical_Label)
## # A tibble: 2 × 2
## # Rowwise: 
##   Categorical_Label     n
##   <fct>             <int>
## 1 Scar              16201
## 2 NoScar            13091

Notably, even though S20 is the control subject, it still has a significant portion of scar (about 1.9K) versus no-scar (about 3.7K). See below:

aggregate_data %>% filter(!is.null(signal)) %>% group_by(sheep) %>% count(Categorical_Label)
## # A tibble: 12 × 3
## # Groups:   sheep [6]
##    sheep Categorical_Label     n
##    <fct> <fct>             <int>
##  1 S12   Scar               1969
##  2 S12   NoScar              988
##  3 S15   Scar               3365
##  4 S15   NoScar             2968
##  5 S17   Scar               1654
##  6 S17   NoScar             1506
##  7 S18   Scar               5040
##  8 S18   NoScar             1869
##  9 S20   Scar               1929
## 10 S20   NoScar             3677
## 11 S9    Scar               2244
## 12 S9    NoScar             2083
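
The per-sheep balance can also be expressed as a scar share. The sketch below uses the S18 and S20 rows copied from the table above; the column name `scar_share` is introduced here for illustration.

```r
library(dplyr)

# Rows copied from the per-sheep count table above.
label_counts <- tribble(
  ~sheep, ~Categorical_Label, ~n,
  "S18",  "Scar",             5040,
  "S18",  "NoScar",           1869,
  "S20",  "Scar",             1929,
  "S20",  "NoScar",           3677
)

# Fraction of each sheep's observations that are labelled Scar.
scar_share <- label_counts %>%
  group_by(sheep) %>%
  summarise(scar_share = sum(n[Categorical_Label == "Scar"]) / sum(n))

print(scar_share)
```

On these rows roughly a third of the S20 observations are labelled Scar, compared with nearly three quarters for S18.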

The same counts, broken down by sheep, catheter type and wavefront:

aggregate_data %>% filter(!is.null(signal)) %>% group_by(sheep,Catheter_Type,WaveFront) %>% 
  count(Categorical_Label)
## # A tibble: 36 × 5
## # Groups:   sheep, Catheter_Type, WaveFront [18]
##    sheep Catheter_Type WaveFront Categorical_Label     n
##    <fct> <fct>         <fct>     <fct>             <int>
##  1 S12   Penta         LVp       Scar                320
##  2 S12   Penta         LVp       NoScar              177
##  3 S12   Penta         RVp       Scar                819
##  4 S12   Penta         RVp       NoScar              358
##  5 S12   Penta         SR        Scar                830
##  6 S12   Penta         SR        NoScar              453
##  7 S15   Penta         LVp       Scar               1079
##  8 S15   Penta         LVp       NoScar              971
##  9 S15   Penta         RVp       Scar                862
## 10 S15   Penta         RVp       NoScar              861
## # ℹ 26 more rows

Distribution of the window-of-interest length (To - From), assuming there is a signal (zero-length windows are excluded). The spread reflects the manual calibration of the starting point of the window of interest.

summary_window_length <- aggregate_data %>% 
  mutate(length_window = To - From) %>% filter(length_window != 0) %>% 
  group_by(WaveFront, Categorical_Label) %>% select(length_window) 
## Adding missing grouping variables: `WaveFront`, `Categorical_Label`
summary_window_length %>%
  plot_ly(x = ~length_window, type = "histogram") %>%
  layout(barmode = "overlay", 
         xaxis = list(title = "Window of interest length"),
         yaxis = list(title = "Frequency"),
         title = "Histogram of Distinct Windows of Interest",
         facet_row = ~WaveFront,
         facet_col = ~Categorical_Label)
## Warning: 'layout' objects don't have these attributes: 'facet_row', 'facet_col'
## Valid attributes include:
## '_deprecated', 'activeshape', 'annotations', 'autosize', 'autotypenumbers', 'calendar', 'clickmode', 'coloraxis', 'colorscale', 'colorway', 'computed', 'datarevision', 'dragmode', 'editrevision', 'editType', 'font', 'geo', 'grid', 'height', 'hidesources', 'hoverdistance', 'hoverlabel', 'hovermode', 'images', 'legend', 'mapbox', 'margin', 'meta', 'metasrc', 'modebar', 'newshape', 'paper_bgcolor', 'plot_bgcolor', 'polar', 'scene', 'selectdirection', 'selectionrevision', 'separators', 'shapes', 'showlegend', 'sliders', 'smith', 'spikedistance', 'template', 'ternary', 'title', 'transition', 'uirevision', 'uniformtext', 'updatemenus', 'width', 'xaxis', 'yaxis', 'barmode', 'bargap', 'mapType'
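
The warning indicates that `facet_row` and `facet_col` are not `layout()` attributes, so the histogram above is not split by WaveFront and Categorical_Label. One possible alternative, sketched here under assumptions, is to facet with ggplot2's `facet_grid()`. The data frame `windows` is a synthetic stand-in for `summary_window_length` with made-up values, and swapping plotly for ggplot2 is a substitution, not the original method.

```r
library(ggplot2)

# Synthetic stand-in for summary_window_length; values are illustrative only.
set.seed(1)
windows <- data.frame(
  WaveFront = rep(c("SR", "LVp"), each = 40),
  Categorical_Label = rep(c("Scar", "NoScar"), times = 40),
  length_window = round(runif(80, min = 50, max = 250))
)

# One histogram panel per WaveFront (rows) and label (columns).
p <- ggplot(windows, aes(x = length_window)) +
  geom_histogram(bins = 20) +
  facet_grid(WaveFront ~ Categorical_Label) +
  labs(x = "Window of interest length", y = "Frequency",
       title = "Histogram of Distinct Windows of Interest")

print(p)
```

If an interactive figure is still wanted, `plotly::ggplotly(p)` can convert the faceted ggplot object.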

Mean window-of-interest length by sheep, excluding zero-length windows.

aggregate_data %>% filter(!is.na(Categorical_Label)) %>% 
  mutate(length_window = To - From) %>% group_by(sheep) %>% filter(length_window != 0) %>% 
  summarise(mean_window = mean(length_window, na.rm = TRUE))
## # A tibble: 6 × 2
##   sheep mean_window
##   <fct>       <dbl>
## 1 S12          128.
## 2 S15          149.
## 3 S17          136.
## 4 S18          125.
## 5 S20          148.
## 6 S9           257

See save_plots.R for plots of signals by WaveFront and sheep.